Multi-faceted aspects of chunking enable robust algorithms

Running head: ALGORITHM FOR CHUNKING INFERENCE
Authors
Abstract
Sequence production tasks are a standard tool to analyze motor learning, consolidation, and habituation. As sequences are learned, movements are typically grouped into subsets or chunks. For example, most Americans memorize telephone numbers in two chunks of 3 digits and one chunk of 4. Studies generally use response times or error rates to estimate how subjects chunk, and these estimates are often related to physiological data. Here we show that chunking is simultaneously reflected in reaction times, errors, and their correlations. This multimodal structure enables us to propose a Bayesian algorithm that better estimates chunks while avoiding over-fitting. Our algorithm reveals previously unknown behavioral structure, such as increased error correlations with training, and promises to be a useful tool for the characterization of many forms of sequential motor behavior.

Introduction

One of the central questions in neuroscience and in cognitive and behavioral psychology is how the brain allows movement performance to improve with practice (Thorndike, 1898; Pavlov, 1927; Crossman, 1959; A. Newell & Rosenbloom, 1981; Skinner, 1938). When learning to produce sequences of actions, animals organize information into groups or “chunks” (Miller, 1956; K. M. Newell, 1991), and studying such chunks is a popular way of studying learning and memory. One common experimental paradigm to study movement chunking is the discrete sequence production (DSP) task, where subjects learn, through many repetitions, to rapidly generate a fixed sequence of finger movements (Adams, 1984; Gentner, 1987; Verwey, 1996; Verwey & Dronkert, 1996; Logan & Bundesen, 2003).
Understanding how practice leads to improved performance in this task promises to unveil how the central nervous system organizes temporal information in a way that enables fast, habitual processing (e.g., see Clerget et al., 2012). Indeed, evidence suggests that chunking is what allows efficient behavior: performance gains in the DSP task correlate with motor chunking (Verwey, 1994, 1996; Verwey & Dronkert, 1996). This may be because long sequences cannot be stored in short-term motor memory, or because the optimal control problem for long sequences is generally infeasible (Todorov et al., 2005; Parr, 1998). Understanding motor chunking is thus crucial to understanding the organization of motor memory and movement efficiency.

Motor chunks are known to leave characteristic traces on observed response times. One of the prominent features is a change in response times and errors at the beginning of chunks: there is a longer-than-usual pause during chunk concatenation (Verwey & Dronkert, 1996; Verwey, 1996). These differences in response times are frequently described and used in algorithms that reveal chunking behavior. It seems natural to think of each chunk as controlled by a single neural representation, and each representation should be able to produce its movements at the right speed, leading to additional predictions about features of chunking. When a new chunk is started, the transition should slow response times, and errors in the transition should lead to errors in the execution. However, we should also expect each chunk to be produced at a relatively fixed speed (Abrahamse et al., 2013). Reaction times and errors within a chunk should therefore be correlated, forming a novel hypothesis to test.

There exists a range of methods for inferring chunking structure.
The most common is to look at the mean response times during a set of trials and detect significant increases at certain elements of the sequence (Abrahamse et al., 2013; Verwey, 1996; Verwey & Dronkert, 1996; Verwey, 1994). These points of significant increase are marked as the start of a new chunk. This simple method limits the analysis by requiring the experimenter to choose which ranges of trials to analyze. A recent approach based on community detection in networks (Wymbs et al., 2012) removes this limitation by modeling time explicitly and providing a time-varying evolution of chunk structures. These methods, however, cannot readily incorporate multiple signals (response times and errors) or deal with correlations of these signals within trials.

In this article, we analyze data from seventeen subjects who participated in a DSP task over an average of 30 days of practice. We find that reaction times, errors, and their respective correlations are associated with chunking structure. Combining information across features and time allows us to construct a Bayesian algorithm that is consistently more precise than algorithms using only one feature. To verify the general applicability of our algorithm, we also analyze two datasets from non-human primates producing and learning sequences over months or even years. The resulting algorithm, which combines the multimodal features of behavior across time, allows precise estimates of the underlying chunking structure.

Methods

Experimental data

We analyze data where humans or monkeys participate in a discrete sequence production (DSP) task. Subjects observe a cue that signals the next element of a sequence of elements to be executed.
Each element of the sequence is signaled to the subject one at a time; after the subject inputs the right element, the system advances to the next element, and so on until the end of the sequence (Figure 2A). This design allows subjects to speed up the task by predicting the next element.

Human data. We reanalyze data from a published study (Bassett et al., 2013). In short, twenty-five subjects completed a training regimen involving the simultaneous acquisition of 6 different 10-element motor sequences using a discrete sequence production (DSP) task. Subjects were asked to input the elements as quickly and accurately as possible. Over the course of a 6-week training regimen, subjects trained on the sequences both at home using their laptop computers and inside an MRI scanner. Training alternated between scanner and home locations, such that following each training session in the scanner, subjects performed a minimum of 10 sessions (1 session/day) at home over the next 2 weeks. This pattern of training repeated 3 times, so that subjects completed on average 30 home training sessions and 4 scan sessions. We only analyze the home training sessions. Sequence familiarity was manipulated during home training at three exposure levels. Each home training session consisted of 150 trials presented using a random schedule, so that two sequences were trained extensively (around 2000 trials), two sequences moderately, and two sequences minimally. All subjects trained on the same sequence set, each at the same exposure level, which was maintained over the course of training. Only the highly trained sequences were analyzed in the present study. Sequence one is 2, 1, 3, 5, 4, 1, 3, 4, 2, and 5, and sequence two is 4, 2, 1, 3, 5, 2, 3, 1, 4, and 5. Fingers are numbered sequentially from (1) the thumb to (5) the pinky.
We excluded 8 subjects because of either low participation rates or increasing response times or error rates with practice.

Monkey data. We reanalyze data from a published study (Matsuzaka et al., 2007, dataset D1). In short, two rhesus macaques participated in a 12-element 5-target discrete sequence production task; we analyze data from one monkey. Targets for the reaching movements were on a touch screen, chosen by touching the screen with the hand, and visually cued 300 ms after the previous target contact. However, the task allowed the monkeys to contact the next target in the sequence during the 300 ms delay before it was shown. When the monkeys made correct anticipatory responses, the task was incremented to the next element of the sequence without display of the touched target. As a result, the monkeys learned to perform the sequence in a predictive fashion without requiring visual cues.

We reanalyze a second dataset from a previous study (Desmurget & Turner, 2010, dataset D2). In this task, two rhesus macaques participated in a 5-element 8-target discrete sequence production task (see also Desmurget & Turner, 2008, 2010). Targets for the movements were displayed on a monitor, chosen by steering a cursor through a target area with a joystick, and visually cued 230 ms after the previous target acquisition. The task required the animal to move the cursor in five consecutive out-and-back movements between the central start position and a series of five peripheral targets. As in dataset D1, with training, the monkeys performed the task in a predictive fashion without relying on visual cues.

Preprocessing

The aim of this study is to understand the evolution of chunking structure over time. For this purpose, it is necessary to control for the effect of practice on overall movement speed, which is unrelated to the within-sequence variability introduced by chunking.
For example, by computing the mean response time as a function of trials, we can clearly see a trend that comes from practicing (Figure 2B). In our analysis, we remove this unrelated effect by detrending the response time data using an exponential model (Logan & Bundesen, 2003). There are alternative models for detrending practice curves, such as the power law, and depending on the circumstances different curves might be used (Heathcote et al., 2000). We assume that the practice component of response time depends on trials across days and on the interaction between elements and trials:

RT_{t,i} = a_0 + a_1 · exp(−b_1·t − b_2·t·i) [practice effect] + RRT_{t,i} [chunking] + noise, (1)

where t is the trial and i is the element. We control for the interaction of trial and element because response time tends to be faster at later elements of the sequence, independent of the chunking structure (Abrahamse et al., 2013). Other covariates assumed to be independent of the chunking structure could be included in the detrending preprocessing, such as the finger used:

RT_{t,i} = a_0 + a_1 · exp(−b_1·t − b_2·t·i) [practice effect] + Σ_{k=2}^{5} finger_k [biomechanical effect] + RRT_{t,i} [chunking] + noise. (2)

The residual response time, RRT, is then assumed to come from the joint contribution of the chunking structure and motor variability (noise). This allows us to use the reaction times as a meaningful way of estimating chunking structure.

Generative model for residual response times (RRT) and errors

Our Bayesian algorithm is based on a so-called generative model, a probabilistic description of how statistical properties of the task (the chunking structure) are reflected in the statistical properties of measurable data (RTs and errors).
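The detrending step of eq. 1 can be sketched as a nonlinear least-squares fit. The following is a minimal illustration on synthetic data, not the authors' implementation; all parameter values and starting guesses are made up for the demo.

```python
import numpy as np
from scipy.optimize import curve_fit

def practice_model(X, a0, a1, b1, b2):
    """Practice effect of eq. 1: a0 + a1 * exp(-b1*t - b2*t*i)."""
    t, i = X  # trial number and element position
    return a0 + a1 * np.exp(-b1 * t - b2 * t * i)

def detrend(rts, trials, elements):
    """Fit the practice curve and return residual response times (RRT)."""
    X = (trials.ravel(), elements.ravel())
    # Initial guesses are illustrative only.
    popt, _ = curve_fit(practice_model, X, rts.ravel(),
                        p0=[0.3, 1.0, 0.01, 1e-4], maxfev=10000)
    return rts - practice_model((trials, elements), *popt)

# Synthetic demonstration: 500 trials of a 10-element sequence.
rng = np.random.default_rng(0)
trials, elements = np.meshgrid(np.arange(500), np.arange(10), indexing="ij")
rts = practice_model((trials, elements), 0.25, 0.8, 0.01, 1e-4) \
      + rng.normal(0, 0.01, trials.shape)
rrt = detrend(rts, trials, elements)
```

After the fit, `rrt` carries only the within-sequence variability (chunking plus noise), which is what the generative model below consumes.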
In our modeling, 'error' refers to the failure of a participant to correctly generate an element; it is therefore an observation of the behavior rather than a statistical estimation error. Here we assume that the chunking structure tends to stay similar from trial to trial but may change a great deal over many trials. Based on preliminary analysis described below, we build into the model the assumption that the first element of each chunk may have a prolonged RRT and raised error probability and that, within each chunk, response times and error locations are correlated. We thus can define a probabilistic graphical model (Jordan, 2004) of how the data are stochastically related within a trial and across trials. The residual response times depend on the parameters of the features and the chunking structure at trial t. The goal of our algorithm is to estimate these parameters. We assume that there is a chunking structure, c_t, which slowly evolves across trials. Statistically, this is captured by the following generative model:

c_0 ∼ π, (3)
c_t | c_{t−1} ∼ Π(c_{t−1} → ·), (4)
RRT_t | c_t ∼ Normal(μ^{RRT}_{c_t}, Σ^{RRT}_{c_t}), (5)
ER_t | c_t ∼ Normal(μ^{ER}_{c_t}, Σ^{ER}_{c_t}). (6)

Our model is a hidden Markov model (HMM) with Gaussian emissions (Murphy, 2012). There is a finite number of chunking structures (c ∈ {1, . . . , K}), and the matrix Π(c → c′) governs the probability of going from chunking structure c to c′. The distribution π is the probability of starting at any of the K chunking structures. Conditioned on the chunking structure, multivariate normal distributions are assumed for residual response times (RRT) and errors (eqs. 5 and 6). Modeling errors as normally distributed can only capture their means and covariances but not their skewness; we made this modeling choice for statistical simplicity and computational efficiency.
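The generative model of eqs. 3–6 can be illustrated by forward sampling. The two candidate structures, the transition matrix, and all emission parameters below are hypothetical values chosen only to show the mechanics (a slowly switching hidden state with elevated RRTs at chunk starts); they are not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

# Two hypothetical chunking structures for a 5-element sequence (0-based).
structures = [[(0, 1, 2), (3, 4)],   # c = 0: chunks {1-3}{4-5}
              [(0, 1), (2, 3, 4)]]   # c = 1: chunks {1-2}{3-5}
pi = np.array([0.5, 0.5])            # initial distribution (eq. 3)
Pi = np.array([[0.99, 0.01],         # slow transitions (eq. 4)
               [0.01, 0.99]])

def mean_rrt(structure, n=5, mu_start=0.08, mu_nonstart=0.0):
    """Mean RRT vector: prolonged at chunk starts (cf. eq. 7)."""
    mu = np.full(n, mu_nonstart)
    for chunk in structure:
        mu[chunk[0]] = mu_start
    return mu

# Sample 200 trials of residual response times (eq. 5, diagonal noise).
c = rng.choice(2, p=pi)
rrts = []
for t in range(200):
    rrts.append(rng.normal(mean_rrt(structures[c]), 0.02))
    c = rng.choice(2, p=Pi[c])
rrts = np.array(rrts)
```

Because element 1 of the sequence always begins a chunk under both structures, its sampled RRTs are systematically longer than those of within-chunk elements, which is exactly the signature the inference algorithm exploits.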
More specifically, the mean residual response time at trial t for element i is

E[RRT_{t,i} | c_t] = μ^{RRT}_{start} if i is the start of a chunk, and μ^{RRT}_{nonstart} otherwise, (7)

and, similarly, the mean error at trial t for element i is

E[ER_{t,i} | c_t] = μ^{ER}_{start} if i is the start of a chunk, and μ^{ER}_{nonstart} otherwise. (8)

The variance across elements is assumed to be homogeneous over chunks, i.e., Var[RRT_{t,i} | c_t] = σ²_{RRT} and Var[ER_{t,i} | c_t] = σ²_{ER}. The covariance takes on a simple structure, assumed constant within a chunk and zero across chunks. Because a constant variance is assumed as well, the correlation within a chunk is constant, and the correlation across chunks is null. Mathematically, the correlations in residual response times and errors at trial t between elements i and j are

Corr[RRT_{t,i}, RRT_{t,j} | c_t] = ρ_{RRT} if i and j are in the same chunk, and 0 otherwise, (9)

and

Corr[ER_{t,i}, ER_{t,j} | c_t] = ρ_{ER} if i and j are in the same chunk, and 0 otherwise. (10)

To infer the chunking structure, we must simultaneously learn the parameters π, Π, μ^{RRT}_{start}, μ^{RRT}_{nonstart}, μ^{ER}_{start}, μ^{ER}_{nonstart}, σ_{RRT}, σ_{ER}, ρ_{RRT}, and ρ_{ER}, and compute the distribution over chunking structures (see Table 1). We use a standard expectation-maximization scheme in which we alternate between finding the expected chunking structure of the model through a forward-backward algorithm (Welch, 2003) and finding the parameters that most likely would have generated those chunking structures. This scheme is proven to find a local maximum of the log-likelihood (Murphy, 2012). The algorithm optimally estimates an interpretation of the data (RTs, errors, and correlations) in terms of a slowly varying chunking structure. The code of the algorithm is available at https://github.com/daniel-acuna/chunk_inference

Generation of candidate chunking structures

A candidate set of chunking structures must be provided to our method.
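Equations 9–10, together with the constant-variance assumption, imply a block-diagonal emission covariance for each chunking structure. A minimal sketch of how such a matrix can be built (function name and example values are illustrative, not from the paper):

```python
import numpy as np

def chunk_covariance(structure, sigma=1.0, rho=0.3):
    """Covariance implied by eqs. 9-10: correlation rho within a chunk,
    zero across chunks, constant variance sigma**2 on the diagonal."""
    n = sum(len(chunk) for chunk in structure)
    cov = np.eye(n) * sigma**2
    pos = 0
    for chunk in structure:
        size = len(chunk)
        block = np.full((size, size), rho * sigma**2)
        np.fill_diagonal(block, sigma**2)
        cov[pos:pos + size, pos:pos + size] = block
        pos += size
    return cov

# A hypothetical 5-element sequence chunked as {1-3}{4-5}:
cov = chunk_covariance([(0, 1, 2), (3, 4)], sigma=0.1, rho=0.4)
```

Each candidate structure thus maps to one such covariance (plus the mean vectors of eqs. 7–8), and those per-structure Gaussians are the emission distributions the forward-backward pass evaluates.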
In this study, we tested all combinations of chunking structures with 1 or more elements. For example, for a 10-element sequence, there are 511 possible chunking structures, from the hypothetical case in which each element is in its own chunk to all 10 elements being in one chunk (Figure 1A). Each chunking structure defines the means and covariances of RRTs and errors, as specified by eqs. 7, 8, 9, and 10 (for concrete examples, see Fig. 1B). After the list of possible chunking structures is defined, the algorithm estimates the parameters of the hidden Markov model, inferring the distribution of chunking structures present at each trial.

Alternatives to the full model and model comparison measures

The model presented in the previous sections uses four features of the data to infer chunking structure: mean RRTs and errors, and RRT/RRT and error/error correlations. By enabling or disabling each of these features, it is possible to fit simpler variations of the full model. In this study, we test 3 variations of the model:

1. RT + ER: full model with correlations set to zero (ρ_{RRT} = 0, ρ_{ER} = 0)
2. RT + ER + RRT/RRT: full model with error correlations set to zero (ρ_{ER} = 0)
3. RT + ER + ER/ER: full model with RRT correlations set to zero (ρ_{RRT} = 0)

Model comparison is performed by fitting the parameters needed for each model on a training dataset and reporting the prediction performance of the model on a separate testing dataset. This procedure is a standard technique in machine learning where models of differing complexity need to be compared (Hastie et al., 2009). Specifically, if a model is too complex for a dataset, it fits the training data very well but performs poorly when predicting the testing dataset due to overfitting. Similarly, if a model is excessively simple, it fits both the training and testing datasets poorly. The model comparison method used here thus favors models that capture real trends in the data rather than the noise.
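The candidate-generation step described above (every way of partitioning an n-element sequence into contiguous chunks) can be sketched by enumerating subsets of internal chunk boundaries. This is an illustrative implementation, not the authors' code, shown here on a short 4-element sequence:

```python
from itertools import combinations

def chunking_structures(n):
    """Enumerate all partitions of an n-element sequence into contiguous
    chunks: one structure per subset of internal boundary positions."""
    structures = []
    for k in range(n):  # k = number of internal boundaries
        for cuts in combinations(range(1, n), k):
            bounds = (0,) + cuts + (n,)
            structures.append([tuple(range(a, b))
                               for a, b in zip(bounds, bounds[1:])])
    return structures

# For a 4-element sequence this yields 2**3 = 8 candidate structures,
# from four singleton chunks up to one chunk covering all elements.
cands = chunking_structures(4)
```

Each returned structure can then be turned into the per-structure Gaussian emission parameters of eqs. 7–10 before running the HMM inference.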
Results

We want to understand how chunking affects behavior and, based on that knowledge, build an algorithm that estimates chunks given response times, errors, and correlations. We base our analysis on results from several previously published experiments that used humans and monkeys as subjects. In the human study, subjects had to type out sequences while given visual information about the correct key to press in sequential order (Figure 2A). Similarly, in the monkey studies, subjects had to reach over a screen or use a joystick to select the correct element in the sequence. Over the course of extensive training, subjects learned to predict each element, and, by the end of the training, response times were very fast (~150 ms), indicating predictive behavior. We characterize the statistics of the multimodal structure of these data and use it to calibrate an algorithm that efficiently estimates chunking structure.

In the human dataset, subjects get better over time (Figure 2B), independent of the actual chunking structure. The exponential fit (eq. 1) reveals that parameters were similar across subjects and sequences, suggesting that practice had a similar effect on response time throughout. Subjects clearly learn to become more efficient, but this effect is orthogonal to the way they chunk the sequence. The raw data show interesting structure that is consistent with the existence of chunks (Fig. 2C). We clearly see that some response times tend to be longer throughout the experiment (e.g., elements 1, 5, and 6) while others are longer only during parts of the experiment (e.g., element 7 from trial 400 to 1,400). However, chunking structure based on response time is salient early in learning and decays in importance later. Errors follow a similar but substantially more subtle trend.
Importantly, the increased error probabilities and reaction times are temporally coherent: if the time is longer on one trial, it tends to be longer on the next trial. These are signs of chunking structure and its slow evolution over time. As we hypothesized, there may be structure in the data that goes beyond mean RRTs and error probabilities. We find clear evidence of chunking structure in the RRT/RRT and error/error correlations for this example subject (Figure 2C, lower panels). Empirically, we found that these four features (mean RRT and error, and RRT/RRT and error/error correlations) are positively associated with one another, suggesting that they represent one underlying structure. Reaction times are correlated with the probability of making an error. Moreover, if a long reaction time indicates the start of a chunk, then the next two items will usually be part of the same chunk and should have correlated reaction times. As predicted, we find that reaction times are associated with the reaction-time correlation with the next item, and in the same manner we find the expected relation between errors and error/error correlations (see Figure 2D for all associations across these four features). Chunking is not just a phenomenon of reaction times but is reflected in the full set of hypothesized features.

Our model is now driven by the insights from the data (Fig. 2D): (1) reaction times and errors, as well as their correlations across elements, are indicators of chunking structure; (2) chunking changes slowly over time. The four features we use are complementary: RTs are powerful features early in learning, but later they require significant integration over time, while errors and correlational information provide a constant secondary source of information, particularly later in learning. Combining these insights allows our model to estimate the chunking structure.
Do RTs and errors inform us about the same underlying chunking structure? To answer this question, we ask how well a model based only on RTs or only on errors predicts the other feature. We find that when estimating chunking based on errors only, response times at chunk boundaries are significantly longer than within chunks (Figure 3A, top panels). We also find that when estimating chunking based on reaction times only, chunk starts have more errors (Figure 3A, lower panels). This suggests that the various features are all reflections of one underlying chunk structure.

We additionally tested the applicability of these ideas to very long learning studies. We use two datasets from two laboratories that involved macaques performing discrete sequence production tasks. The steps used to analyze the data were analogous to those used in the human experiment previously described. For one dataset (D1), from Matsuzaka et al. (2007), we analyzed response times and errors; for the second dataset (D2), from Desmurget & Turner (2010), we analyzed only errors. Dataset D1 contains around 120,000 trials of 12-element 5-target sequences from one monkey, and dataset D2 totals around 138,000 trials of 5-element 8-target sequences from two monkeys. The extent and controllability achieved by these experiments make them ideal for testing the introduced algorithm. Indeed, we find that for all datasets, when we train on the first 80% of the data to predict the last 20%, we do better than a no-chunking algorithm (Fig. 3, B and C). These results show that our algorithm is able to describe meaningful aspects of sequence production tasks over long time scales. A more detailed analysis of the human data with the full model is performed next. The algorithm provides us with time-varying fits to RTs, errors, and correlations.
The model can provide the most probable chunking structure at each trial (see example inference in Figure 4B) and qualitatively fits the response times and errors, exhibiting a block-diagonal correlational structure consistent with the data (compare Figure 2C with Figure 4C). Over the course of learning, the nervous system may use chunks of varying size. Indeed, we see that smaller chunks are replaced by larger ones (Fig. 5A), consistent with the literature (Abrahamse et al., 2013). It seems that subjects encode sequences using many chunks at the beginning of learning and then slowly “compile” them into longer chunks, as had been suggested by the prior literature (K. M. Newell, 1991; Ericsson, 2006; Wymbs et al., 2012). The algorithm uses whichever features are indicative of the chunking structure, which may evolve over time. Response times and errors at chunking boundaries evolve over time (Fig. 5B). RT/RT correlations remain relatively constant throughout the experiment and, interestingly, error/error correlations increase over time (Fig. 5C). Across learning, our algorithm identifies chunks using reaction times, errors, and their correlations, and how chunks are expressed evolves over time.

Discussion

In this work, we have analyzed reaction times, errors, and their correlations, and found all of them to be indicative of an underlying chunking structure. Interestingly, the signs of chunking structure change over time: the response time signal is more important than errors, and error correlations become increasingly important toward the end of training. We also found the chunking structure to evolve slowly over time, with more chunks at the beginning and fewer chunks at the end. Combining these properties into a joint model for chunking inference enabled an algorithm that uses the breadth of available data.
Our algorithm consistently outperforms simpler algorithms across datasets from humans and monkeys. It promises to help us better understand how motor memory is organized in sequence production tasks and how the central nervous system improves performance through practice.

Any model needs to make simplifying assumptions, and we want to discuss their effects. First, there could be different ways in which the central nervous system structures chunking. For example, we have assumed that, at any point during training, there is only one underlying chunking structure that we need to uncover. In principle, different brain regions could chunk differently, and chunking in one brain area could affect reaction times while another affects error probabilities. However, our finding that RTs can predict errors and vice versa suggests that at least some of the chunking structure is shared. Second, the data we analyzed come from experiments with several interleaved sequences, which is ignored by our analysis. We believe, however, that the interaction across sequences does not strongly affect our results. First, the parameters obtained during detrending suggested that the effect of learning is essentially equal across subjects and sequences. Second, the sequences were originally created so that they do not contain regularities (e.g., 121, 123, 11), lowering the potential for across-sequence interactions. Interference across sequences could, however, be added by considering higher-order hidden Markov models or hidden semi-Markov models (Murphy & Paskin, 2002). It seems unlikely that the structures we uncovered were produced by the interleaved sequences.

Previous studies have examined possible brain substrates of chunking behavior in the discrete sequence production task. Distinct areas have been found to correlate with chunk splitting (a left-hemisphere fronto-parietal network) and chunk concatenation (the bilateral sensorimotor putamen) (Wymbs et al., 2012).
Other studies have related the ability of humans to learn with the “flexibility” they have to change the representation of motor sequences (Bassett et al., 2011, 2013). Taken together, these results suggest that sequence production tasks are a powerful yet simple paradigm to study learning and skill acquisition and their relationship to the brain. A tool like the one presented here can be used to pinpoint individual differences in learning performance more precisely as a result of improved chunking inference.

A potential extension could penalize chunking structures that have either too few or too many chunks (Verwey, 1994). This modeling assumption would capture the intuition that remembering many chunks adds to the cognitive load and is therefore less likely to occur (e.g., Verwey, 1996), and that having one long chunk is also unlikely because it would require the motor system to store the entire sequence in one chunk of memory (Sternberg et al., 1978). In our algorithm, this notion could be added by regularizing the initial distribution (equation 3) with an appropriate prior that penalizes too few or too many chunks.

We have introduced a coherent and statistically meaningful approach for chunking inference in sequence production tasks. While alternative methods can provide estimates of the chunking structure (e.g., Wymbs et al., 2012; Bassett et al., 2013; Mucha et al., 2010; Fortunato, 2010), they assume a time-invariant evolution of the chunking structure and do not deal with multiple features and correlations. Our algorithm produces estimates for each trial by design, while other approaches provide an estimate for a window of trials whose size needs to be provided (Verwey, 1996).
Also, since our method is a fully generative model of how the data are produced, it can gracefully deal with missing data, and it could be extended to provide estimates and predictions of any random variable with Bayesian credible intervals (“error bars”). The fundamental ideas used in our algorithm also provide a framework to understand other motor learning tasks beyond discrete sequence production. For example, our algorithm is amenable to hierarchical and non-Markovian probabilistic structures (e.g., see Heller et al., 2009). Our algorithm runs relatively fast (analyzing 2000 trials requires around a minute on an ordinary desktop computer) and can easily be brought into the analysis of more complex tasks that involve varying, continuous motor command execution (e.g., grasping or reaching). Similar to the work that has been done in DSP tasks, our algorithm may serve to improve understanding of how motor memory is consolidated in general, by using the same probabilistic principles we used here.

References

Abrahamse, E. L., Ruitenberg, M. F. L., de Kleine, E., & Verwey, W. B. (2013). Control of automated behavior: Insights from the discrete sequence production task. Frontiers in Human Neuroscience, 7. doi: 10.3389/fnhum.2013.00082

Adams, J. A. (1984). Learning of movement sequences. Psychological Bulletin, 96(1), 3. doi: 10.1037/0033-2909.96.1.3

Bassett, D. S., Wymbs, N. F., Porter, M. A., Mucha, P. J., Carlson, J. M., & Grafton, S. T. (2011). Dynamic reconfiguration of human brain networks during learning. Proceedings of the National Academy of Sciences, 108(18), 7641. doi: 10.1073/pnas.1018985108

Bassett, D. S., Wymbs, N. F., Rombach, M. P., Porter, M. A., Mucha, P. J., & Grafton, S. T. (2013). Task-based core-periphery organization of human brain dynamics. PLoS Computational Biology, in press.

Clerget, E., Poncin, W., Fadiga, L., & Olivier, E. (2012).
Role of Broca's area in implicit motor skill learning: Evidence from continuous theta-burst magnetic stimulation. Journal of Cognitive Neuroscience, 24(1), 80-92.

Crossman, E. R. F. W. (1959). A theory of the acquisition of speed-skill. Ergonomics, 2, 153-166. doi: 10.1080/00140135908930419

Desmurget, M., & Turner, R. S. (2008). Testing basal ganglia motor functions through reversible inactivations in the posterior internal globus pallidus. Journal of Neurophysiology, 99(3), 1057-1076. doi: 10.1152/jn.01010.2007

Desmurget, M., & Turner, R. S. (2010). Motor sequences and the basal ganglia: Kinematics, not habits. The Journal of Neuroscience, 30(22), 7685-7690. doi: 10.1523/JNEUROSCI.0163-10.2010

Ericsson, K. A. (2006). The influence of experience and deliberate practice on the development of superior expert performance. In K. A. Ericsson, N. Charness, & P. Feltovich (Eds.), The Cambridge handbook of expertise and expert performance (p. 685-706). Cambridge; New York: Cambridge University Press.

Fortunato, S. (2010). Community detection in graphs. Physics Reports, 486(3-5), 75-174. doi: 10.1016/j.physrep.2009.11.002

Gentner, D. R. (1987). Timing of skilled motor performance: Tests of the proportional duration model. Psychological Review, 94(2), 255-276. doi: 10.1037//0033-295X.94.2.255

Hastie, T., Tibshirani, R., & Friedman, J. H. (2009). The elements of statistical learning. New York, NY: Springer.

Heathcote, A., Brown, S., & Mewhort, D. (2000). The power law repealed: The case for an exponential law of practice. Psychonomic Bulletin & Review, 7(2), 185-207.

Heller, K. A., Teh, Y. W., & Gorur, D. (2009). Infinite hierarchical hidden Markov models. Proceedings of the 12th International Conference on Artificial Intelligence and Statistics (AISTATS).

Jordan, M. I. (2004). Graphical models. Statistical Science, 140-155.

Logan, G. D., & Bundesen, C. (2003).
Clever homunculus: Is there an endogenous act of control in the explicit task-cuing procedure? Journal of Experimental Psychology: Human Perception and Performance, 29(3), 575. doi: 10.1037/0096-1523.29.3.575

Matsuzaka, Y., Picard, N., & Strick, P. (2007). Skill representation in the primary motor cortex after long-term practice. Journal of Neurophysiology.

Miller, G. A. (1956). The magical number seven, plus or minus two: Some limits on our capacity for processing information. Psychological Review, 63(2), 81. doi: 10.1037/h0043158

Mucha, P. J., Richardson, T., Macon, K., Porter, M. A., & Onnela, J.-P. (2010). Community structure in time-dependent, multiscale, and multiplex networks. Science, 328(5980), 876-878. doi: 10.1126/science.1184819

Murphy, K. P. (2012). Machine learning: A probabilistic perspective. Cambridge, MA: MIT Press.

Murphy, K. P., & Paskin, M. A. (2002). Linear-time inference in hierarchical HMMs. Advances in Neural Information Processing Systems, 2, 833-840.

Newell, A., & Rosenbloom, P. S. (1981). Mechanisms of skill acquisition and the law of practice. In Cognitive skills and their acquisition (pp. 1-55).

Newell, K. M. (1991). Motor skill acquisition. Annual Review of Psychology, 42, 213-237. doi: 10.1146/annurev.ps.42.020191.001241

Parr, R. E. (1998). Hierarchical control and learning for Markov decision processes. Unpublished doctoral dissertation, University of California, Berkeley.

Pavlov, I. (1927). Conditioned reflexes. New York: Oxford University Press.

Skinner, B. F. (1938). The behavior of organisms: An experimental analysis (illustrated, reprint ed.). D. Appleton-Century Company.

Sternberg, S., Monsell, S., Knoll, R. L., & Wright, C. E. (1978). Information processing in motor control and learning. In (pp. 117-152).

Thorndike, E. L. (1898). Animal intelligence: An experimental study of the association processes in animals. Psychological Review Monograph, 2(8).

Todorov, E., Li, W., & Pan, X.
(2005). From task parameters to motor synergies: A hierarchical framework for approximately optimal control of redundant manipulators. Journal of Robotic Systems, 22(11), 691-710. doi: 10.1002/rob.20093

Verwey, W. B. (1994). Evidence for the development of concurrent processing in a sequential keypressing task. Acta Psychologica, 85(3), 245-262.

Verwey, W. B. (1996). Buffer loading and chunking in sequential keypressing. Journal of Experimental Psychology: Human Perception and Performance, 22(3), 544.

Verwey, W. B., & Dronkert, Y. (1996). Practicing a structured continuous key-pressing task: Motor chunking or rhythm consolidation? Journal of Motor Behavior, 28(1), 71-79. doi: 10.1080/00222895.1996.9941735

Welch, L. R. (2003). Hidden Markov models and the Baum-Welch algorithm. IEEE Information Theory Society Newsletter, 53(4), 1-14.

Wymbs, N., Bassett, D., Mucha, P., Porter, M., & Grafton, S. (2012). Differential recruitment of the sensorimotor putamen and frontoparietal cortex during motor chunking in humans. Neuron, 74(5), 936-946. doi: 10.1016/j.neuron.2012.03.038

Author Note

Supported by PHS Program Project Grant NS44393 to K.K., P.L.S., R.S.T., and S.T.G., and NS074044 to K.K.

Table 1
Parameters needed by the model to estimate chunks. These parameters are fitted by expectation maximization.

Parameter        Description
π                Initial distribution of chunking structures
Π                Transition probability matrix between chunking structures
μRRT,start       Mean RRT at the beginning of a chunk
μRRT,nonstart    Mean RRT within a chunk
μER,start        Error rate at the beginning of a chunk
μER,nonstart     Error rate within a chunk
σ²RRT            RRT variance
σ²ER             Error rate variance
ρRRT             Correlation between RRTs of elements within the same chunk
ρER              Correlation between errors of elements within the same chunk
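To illustrate how a generative model of the kind parameterized in Table 1 "gracefully deals with missing data," the following is a minimal sketch, not the paper's fitted model: a two-state forward pass in which each key press is latently either a chunk start or a within-chunk element, with Gaussian RRT emissions. A missing RRT contributes emission likelihood 1, i.e., it is marginalized out. All parameter names echo Table 1 (π, Π, μ, σ²), but every numerical value is illustrative.

```python
import math

# Illustrative two-state sketch: latent state 0 = chunk start, 1 = within chunk.
pi = [0.5, 0.5]            # π: initial distribution over states
Pi = [[0.3, 0.7],          # Π: a chunk start is usually followed by
      [0.2, 0.8]]          #    within-chunk presses
mu = [0.45, 0.25]          # mean RRT (s): chunk start vs. within chunk
sigma2 = 0.01              # shared RRT variance (illustrative)

def gauss(x, m, v):
    return math.exp(-(x - m) ** 2 / (2 * v)) / math.sqrt(2 * math.pi * v)

def forward(rrts):
    """Forward pass returning P(state_t | data up to t).
    A missing RRT (None) contributes emission likelihood 1,
    i.e., it is marginalized out by the generative model."""
    alpha = [pi[s] * (1.0 if rrts[0] is None else gauss(rrts[0], mu[s], sigma2))
             for s in (0, 1)]
    posteriors = []
    for t, x in enumerate(rrts):
        if t > 0:
            alpha = [sum(alpha[r] * Pi[r][s] for r in (0, 1))
                     * (1.0 if x is None else gauss(x, mu[s], sigma2))
                     for s in (0, 1)]
        z = sum(alpha)
        alpha = [a / z for a in alpha]   # normalize to avoid underflow
        posteriors.append(alpha[:])
    return posteriors

# A slow first press looks like a chunk start; fast presses after a
# missing observation still shift belief toward "within chunk".
posts = forward([0.44, None, 0.26, 0.27])
```

In the full model, these parameters would not be hand-set but estimated by expectation maximization, as stated in the Table 1 caption; this sketch only shows the inference step that makes missing observations unproblematic.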
Publication date: 2014